Unsupervised Morpheme Discovery with Allomorfessor

نویسندگان

  • Sami Virpioja
  • Oskar Kohonen
چکیده

We describe Allomorfessor, which extends the unsupervised morpheme segmentation method Morfessor to account for the linguistic phenomenon of allomorphy, where one morpheme has several different surface forms. The method discovers common base forms for allomorphs from an unannotated corpus by finding small modifications, called mutations, for them. Using Maximum a Posteriori estimation, the model is able to decide the amount and types of the mutations needed for the particular language. The method is evaluated in Morpho Challenge 2009.

برای دانلود رایگان متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید

ثبت نام

اگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید

منابع مشابه

Allomorfessor: Towards Unsupervised Morpheme Analysis

We extend the unsupervised morpheme segmentation method Morfessor Baseline to account for the linguistic phenomenon of allomorphy, where one morpheme has several different surface forms. Our method discovers common base forms for allomorphs from an unannotated corpus. We evaluate the method by participating in the Morpho Challenge 2008 competition 1, where inferred analyses are compared against...

متن کامل

Unsupervised Morpheme Discovery with Ungrade

In this paper, we present an unsupervised algorithm for morpheme discovery called UNGRADE (UNsupervised GRAph DEcomposition). UNGRADE works in three steps and can be applied to languages whose words have the structure prefixes-stem-suffixes. In the first step, a stem is obtained for each word using a sliding window, such that the description length of the window is minimised. In the next step p...

متن کامل

Morpho Challenge 2005-2010: Evaluations and Results

Morpho Challenge is an annual evaluation campaign for unsupervised morpheme analysis. In morpheme analysis, words are segmented into smaller meaningful units. This is an essential part in processing complex word forms in many large-scale natural language processing applications, such as speech recognition, information retrieval, and machine translation. The discovery of morphemes is particularl...

متن کامل

Morpho Challenge competition 2005-2010: Evaluations and results

Morpho Challenge is an annual evaluation campaign for unsupervised morpheme analysis. In morpheme analysis, words are segmented into smaller meaningful units. This is an essential part in processing complex word forms in many large-scale natural language processing applications, such as speech recognition, information retrieval, and machine translation. The discovery of morphemes is particularl...

متن کامل

Unsupervised Discovery of Persian Morphemes

This paper reports the present results of a research on unsupervised Persian morpheme discovery. In this paper we present a method for discovering the morphemes of Persian language through automatic analysis of corpora. We utilized a Minimum Description Length (MDL) based algorithm with some improvements and applied it to Persian corpus. Our improvements include enhancing the cost function usin...

متن کامل

ذخیره در منابع من


  با ذخیره ی این منبع در منابع من، دسترسی به آن را برای استفاده های بعدی آسان تر کنید

عنوان ژورنال:

دوره   شماره 

صفحات  -

تاریخ انتشار 2009